    • Article

      Application acceleration with the cell broadband engine 

      Shi, G.; Kindratenko, V.; Pratas, F.; Trancoso, Pedro; Gschwind, M. (2010)
      The Cell Broadband Engine is a heterogeneous chip multiprocessor that combines a PowerPC processor core with eight single-instruction multiple-data accelerator cores and delivers high performance on many computationally ...
    • Article  

      Balancing networks: State of the art 

      Mavronicolas, Marios (1997)
      Balancing networks have recently been proposed by Aspnes et al. (Proceedings of the 23rd Annual ACM Symposium on Theory of Computing, May 1991, pp. 348-358) as a new class of distributed, low-contention data structures ...
    • Article  

      Block Scheduling of Iterative Algorithms and Graph-Level Priority Scheduling in a Simulated Data-Flow Multiprocessor 

      Evripidou, Paraskevas; Gaudiot, J.-L. (1993)
      While data-flow principles permit the utilization of large-scale multiprocessor systems with high programmability and good efficiency, they also introduce much overhead at runtime. In this paper, we have studied an important ...
    • Article  

      CacheFlow: Cache optimizations for data driven multithreading 

      Kyriacou, Costas; Evripidou, Paraskevas; Trancoso, Pedro (2006)
      Data-Driven Multithreading is a non-blocking multithreading model of execution that provides effective latency tolerance by allowing the computation processor to do useful work while a long-latency event is in progress. With ...
    • Article  

      A combinatorial treatment of balancing networks 

      Busch, Costas; Mavronicolas, Marios (1996)
      Balancing networks, originally introduced by Aspnes et al. (Proceedings of the 23rd Annual ACM Symposium on Theory of Computing, pp. 348-358, May 1991), represent a new class of distributed, low-contention data structures ...
    • Conference Object  

      Comparison of techniques used for mapping parallel algorithms to message-passing multiprocessors 

      Dikaiakos, Marios D.; Steiglitz, Kenneth; Rogers, Anne (IEEE, 1994)
      This paper presents a comparison study of popular clustering and mapping heuristics which are used to map task-flow graphs to message-passing multiprocessors. To this end, we use task-graphs which are representative of ...
    • Conference Object  

      Contention in balancing networks resolved 

      Hadjimitsis, Leonidas; Mavronicolas, Marios (ACM, 1998)
      Counting networks were originally presented by Aspnes et al. as a new class of distributed/coordinated data structures suitable for solving many fundamental multiprocessor coordination problems that can be expressed ...
    • Book Chapter  

      DART: a data-driven processor architecture for real-time computing 

      Farquhar, William G.; Evripidou, Paraskevas (Elsevier Science Publishers B.V., 1993)
      This paper presents the design of DART, a Data-driven processor Architecture for Real-Time computing. The DART processor is designed to be the key building block in real-time multiprocessor systems that can handle multiple ...
    • Conference Object  

      Extracting parallelism in Fortran by translation to a single assignment intermediate form 

      Barry, Robert J.; Evripidou, Paraskevas (IEEE, 1994)
      This paper presents MUSTANG, a system for translating Fortran to single assignment form in an effort to automatically extract parallelism. Specifically, a sequential Fortran source program is translated into IF1, a ...
    • Conference Object  

      Fault detection and recovery in a data-driven real-time multiprocessor 

      Farquhar, William G.; Evripidou, Paraskevas (IEEE, 1994)
      This paper introduces the mechanisms required to perform fault detection and recovery in the DART multiprocessor architecture. The DART multiprocessor uses prioritized data-driven scheduling to ensure that multiple hard ...
    • Article  

      Functional algorithm simulation of the fast multipole method: Architectural implications 

      Dikaiakos, Marios D. (1996)
      Functional Algorithm Simulation is a methodology for predicting the computation and communication characteristics of parallel algorithms for a class of scientific problems, without actually performing the expensive numerical ...
    • Article  

      Mapping Fortran programs to single assignment semantics for efficient parallelization 

      Evripidou, Paraskevas (1998)
      This paper presents Mustang, a system that automatically parallelizes Fortran programs by mapping them to single assignment semantics. Specifically, sequential Fortran source programs are translated into IF1, a ...
    • Article  

      Memory assignment for multiprocessor caches through grey coloring 

      Agarwal, A.; Guttag, J. V.; Hadjicostis, Christoforos N.; Papaefthymiou, M. C. (1994)
      The achieved performance of multiprocessors is heavily dependent on the performance of their caches. Cache performance is severely degraded when data tiles used by a program conflict in the caches. This paper explores ...
    • Conference Object  

      Memory performance of DSS commercial workloads in shared-memory multiprocessors 

      Trancoso, Pedro; Larriba-Pey, Josep-L; Zhang, Zheng; Torrellas, Josep (IEEE, 1997)
      Although cache-coherent shared-memory multiprocessors are often used to run commercial workloads, little work has been done to characterize how well these machines support such workloads. In particular, we do not have much ...
    • Conference Object  

      Performance study of cosmological simulations on message-passing and shared-memory multiprocessors 

      Dikaiakos, Marios D.; Stadel, Joachim (ACM, 1996)
      In this paper we describe PKDGRAV, a parallel hierarchical tree-structured code used to conduct cosmological simulations on shared-memory and message-passing multiprocessors. We explore performance traits of cosmological ...
    • Book Chapter  

      Results of parallel implementations of the selection problem using Sisal 

      Daumas, Marc; Evripidou, Paraskevas (Elsevier Science Publishers B.V., 1993)
      This paper presents an in-depth analysis of the parallel implementation of four of the standard selection algorithms using a functional language on a number of multiprocessors and supercomputers. Three of the algorithms: ...
    • Conference Object  

      Thermal-aware scheduling: A solution for future chip multiprocessors thermal problems 

      Stavrou, Kyriakos; Trancoso, Pedro (2006)
      The increased complexity and operating frequency of current microprocessors are resulting in diminishing performance improvements. In order to keep up with the expected performance gains, major manufacturers have ...
    • Conference Object  

      Wait-free solvability via combinatorial topology 

      Mavronicolas, Marios (1996)
      This paper addresses the question of whether Algebraic Topology is really necessary for determining the characterization of solvable tasks for the case of general t. A Combinatorial Topology framework, totally bypassing ...